Accelerated DNA replication fork speed due to loss of R-loops in myelodysplastic syndromes with SF3B1 mutation

Myelodysplastic syndromes (MDS) with mutated SF3B1 gene present features including a favourable outcome distinct from MDS with mutations in other splicing factor genes SRSF2 or U2AF1. Molecular bases of these divergences are poorly understood. Here we find that SF3B1-mutated MDS show reduced R-loop formation predominating in gene bodies associated with intron retention reduction, not found in U2AF1- or SRSF2-mutated MDS. Compared to erythroblasts from SRSF2- or U2AF1-mutated patients, SF3B1-mutated erythroblasts exhibit augmented DNA synthesis, accelerated replication forks, and single-stranded DNA exposure upon differentiation. Importantly, histone deacetylase inhibition using vorinostat restores R-loop formation, slows down DNA replication forks and improves SF3B1-mutated erythroblast differentiation. In conclusion, loss of R-loops with associated DNA replication stress represents a hallmark of SF3B1-mutated MDS ineffective erythropoiesis, which could be used as a therapeutic target.

Myelodysplastic syndromes (MDS) with mutated SF3B1 gene present features including a favourable outcome distinct from MDS with mutations in other splicing factor genes SRSF2 or U2AF1.Molecular bases of these divergences are poorly understood.Here we find that SF3B1-mutated MDS show reduced R-loop formation predominating in gene bodies associated with intron retention reduction, not found in U2AF1-or SRSF2-mutated MDS.Compared to erythroblasts from SRSF2-or U2AF1-mutated patients, SF3B1-mutated erythroblasts exhibit augmented DNA synthesis, accelerated replication forks, and single-stranded DNA exposure upon differentiation.Importantly, histone deacetylase inhibition using vorinostat restores R-loop formation, slows down DNA replication forks and improves SF3B1-mutated erythroblast differentiation.In conclusion, loss of R-loops with associated DNA replication stress represents a hallmark of SF3B1-mutated MDS ineffective erythropoiesis, which could be used as a therapeutic target.
Myelodysplastic syndromes (MDS) are hematopoietic stem cell neoplasms with heterogeneous outcome and limited therapeutic options in which somatic mutation in splicing factor (SF) genes is a cardinal feature 1 .MDS with splicing factor3b subunit 1 mutation (SF3B1 MUT ) are commonly associated with bone marrow (BM) ring sideroblasts and ineffective erythropoiesis 2,3 .The response rate to erythropoiesisstimulating agents (ESA) is achieved in as many as 50-60%, of low-risk MDS, but patients with SF3B1 MUT MDS could be more often primary resistant with a trend to shorter median duration of response 4,5 .The transforming-growth factor beta family ligand trap, luspatercept has been initially approved to treat the anemia of transfusion-dependent patients with MDS with ring sideroblasts after ESA failure.Forty-five percent achieve transfusion independency with a 30-week median duration of response and patients could continue to benefit from long-term treatment [6][7][8] .Deciphering the mechanisms of anemia is needed to generate treatments.
SF3B1 mutation causes multiple alterations in mRNA processing.The use of alternative 3′ or 5′ splice site produces transcripts containing short intronic sequences that are degraded by the non-sense mediated decay (NMD) or are translated into a variant protein [9][10][11] .These splicing changes drive transcriptional reprogramming that shapes disease phenotype.For example, down-regulation of Fe-S cluster transporter ABCB7 by transcript isoform-specific degradation, and reduced translation efficiency of mitochondrial iron transporter TMEM14C, contribute to mitochondrial iron accumulation 12,13 .Overproduction of alternative and canonical transcripts of ERFE gene encoding hepcidin transcriptional repressor erythroferrone, leads to systemic iron overload 11 .SF3B1 mutation also targets mitochondrial respiration and serine synthesis pathway 14 .While mutations in serine and arginine-rich splicing factor 2 (SRSF2) and U2 small nuclear RNA auxiliary factor 1 (U2AF1), predominantly alter cassette exon 15,16 , SF3B1 MUT splicing pattern is dominated by intron retention reduction (IRR) 17 .
An increasing number of genomic alterations associated with MDS progression to acute myeloid leukemia (AML) suggests a genomic instability of stem and progenitor cells 18 .DNA damage and activation of ataxia telangiectasia and Rad3-related protein (ATR) pathway were detected in SF-mutated MDS 19,20 .More specifically, SRSF2 and U2AF1 mutations induce the formation of unscheduled RNA:DNA hybrids or R-loops, triggering DNA replication stress [19][20][21] .Mechanistically, mutant SRSF2 impairs the RNA polymerase II transcription pause release, allowing nascent RNA forming a R-loop at promoter 21 .SF3B1 has been involved in the pathways of DNA repair 22,23 .However, SF3B1 MUT MDS patients have a lower risk of AML than other MDS 24,25 suggesting that SF3B1 MUT MDS are less prone to genomic instability.
In the present study, we report that, on the contrary to SRSF2 MUT or U2AF1 MUT cells, SF3B1 MUT erythroblasts demonstrate a significant loss of R-loops.These cells endure a DNA replication stress consisting in accelerated fork progression and single-stranded (ss)DNA exposure, and correlating with increased erythroid cell proliferation and impaired differentiation.The ability of low doses of histone deacetylase inhibitor (HDACi) vorinostat to restore R-loops without DNA damage, and to improve erythroid differentiation could serve as a therapeutic approach.

Results
Reduction of intron retention correlates with transcriptomic changes of SF3B1-mutated bone marrow mononuclear cells A total of 143 subjects were enrolled in this study including 70 MDS with SF3B1 mutation, 49 SF3B1 WT MDS and 24 healthy controls (Table 1).To investigate the molecular pathways whose deregulation drives the phenotype of SF3B1 MUT -MDS, BM mononuclear cells (MNC) RNA-sequencing data available from 21 SF3B1 MUT -MDS and 6 SF3B1 WT -MDS were re-analyzed 11 .With a mean number of 87 to 97 million reads per sample, DESeq2 analysis identified 1764 differentially expressed genes (DEGs) including 812 up-and 952 down-regulated genes (log 2 fold-change (FC) > | 1 | , Benjamini-Hochberg (BH)-adjusted P value < 0.05) (Fig. 1a; Supplementary Data 1).Gene ontology (GO) enrichment analysis showed that up-and down-regulated genes were involved in several pathways such as DNA replication, DNA repair, chromatid segregation, and cell cycle checkpoint signaling (Fig. 1b).These pathways were over-represented among up-regulated genes (Supplementary Fig. 1a).Eighty genes associated with these pathways allowed the clustering of SF3B1 MUT -samples (Fig. 1c; Supplementary Data 1).Using KisSplice with a variation of percent splice in (ΔPSI) > | 0.10| and a BHadjusted P value < 0.05, we detected 3937 differential splicing events (DSEs) in SF3B1 MUT samples, including 1256 abnormal intron retention events, consisting in a majority of IRRs (n = 1027) in SF3B1 MUT -samples (Fig. 1d).IRRs were the most frequent event in SF3B1 MUT -MDS, uncommon in the SRSF2 MUT or U2AF1 MUT -MDS of a large cohort of 189 lower-risk patients (Supplementary Fig. 1b).A GO analysis of the 822 genes affected by IRR revealed their over-representation in DNA replication, DNA repair, cell cycle regulation, and mRNA splicing (Supplementary Fig. 1c).Combined DEG and DSE analyses showed that 296 DEGs targeted by 384 IRR events referred to DNA repair, DNA replication, cell cycle process and regulation of chromosome separation (Supplementary Data 1; Fig. 1e).Thus, IRR might contribute to gene expression changes of SF3B1 MUT -BM MNC.
RNA-sequencing provided an equivalent mean number of million reads per sample in basoE (116 ± 18) and polyE (97 ± 19) (t-test; P = 0.543).With a log 2 (FC) > | 1| (BH-adj P value < 0.05), a total of 675 DEGs (514 up and 161 down) separated mutant and wild-type basoE.The number of DEGs was 765 (390 up, 375 down) in polyE with an overlap of 179 genes with basoE (Fig. 2d, Fig. 2e; Supplementary Data 2).GO enrichment analysis showed that up-and down-regulated genes in SF3B1 MUT -basoE or -polyE were involved in several pathways such as DNA repair, regulation of MAP kinase cascade, ubiquitindependent protein catabolism, cellular response to DNA damage and oxygen-containing compound (Fig. 2f; Supplementary Fig. 3a).In basoE, a GeneSet Enrichment Analysis (GSEA) refined these results showing the deregulation of specific DNA repair pathways such as base excision repair, and a trend for nucleotide excision repair, but neither homologous recombination nor non-homologous end-joining or mismatch repair.Genes involved in DNA replication and G1/S phase checkpoint were also significantly deregulated (Fig. 2g; Supplementary Fig. 3b, c).The DSE profiles identified in basoE and polyE were similar to that of BM MNC (Fig. 2h).IRR represented 194/829 (23.4%)DSEs in SF3B1 MUT -basoE and 383/1182 (32.4%)DSEs in SF3B1 MUT -polyE, respectively, showing that IRR frequency increased with cell differentiation (Fig. 2i; Supplementary Data 2).Genes affected by IRR regardless of the stage of erythroid differentiation were involved in the response to DNA damage, mitotic cell cycle regulation, nucleocytoplasmic transport and in the positive regulation of histone deacetylation (Fig. 2j).As the nuclear retention or cytoplasmic degradation of IR-containing transcripts participate in the differentiation and specialization of normal erythroid cells [26][27][28] , we reasoned that IRR-transcripts detected in SF3B1 MUT -erythroblasts may reshape the transcriptome contributing to defective maturation.

SF3B1-mutated erythroid precursors demonstrate characteristic proteomic signatures
We concomitantly performed a proteomic analysis of proE, basoE and polyE with label-free quantification (LFQ).Principal component analysis showed that proE and basoE clustered together and separately from polyE.In each group, the second dimension discriminated SF3B1 MUT from SF3B1 WT -samples (Supplementary Fig. 4a).In subsequent analyses, proE and basoE were grouped.The mean number of identified proteins with LFQ values per sample was 4231 ( ± 263), without significant difference between SF3B1 MUT and SF3B1 WT -samples, or between proE/basoE and polyE samples of each genetic group.A total of 443 and 290 differentially expressed proteins were detected between SF3B1 MUT and SF3B1 WT -samples in proE/basoE and in polyE, respectively (t-test, Log 2 (LFQ intensity)>|0.20|, P value < 0.05) (Fig. 2k; Supplementary Fig. 4b, c).

SF3B1 mutation induces a significant loss of R-loops in erythroid cells
RNA splicing may attenuate the probability of forming R-loops by reducing the homology between nascent RNAs and their DNA templates and/or by recruiting splicing factor that antagonize RNA:DNA hybrid formation [29][30][31][32][33] .We examined the profiles of R-loops genomewide in human primary basoE by performing DNA-RNA immunoprecipitation (DRIP) using S9.6 antibody followed by sequencing in 5 SF3B1 MUT , 6 SF3B1 WT including 3 with low variant allele frequency, 3 with high risk-mutations (1 SRSF2 mutation, 1 SRSF2 and bi-allelic TET2 (biTET2) co-mutations, 1 NRAS and biTET2 co-mutations) and 4 controls.DRIP specificity was assessed by a pre-treatment with RNase H1 (RNH1).Stringent calling identified true-positive peaks (BH-adj P value < 0.05) and we considered shared peaks between samples of each group.The number of shared peaks in SF3B1 MUT erythroblasts was significantly lower than SF3B1 WT or control cells (Fig. 3a).Visualization of R-loop profiles of a 50 kb region on chromosome 7 demonstrated the overall reduction of the peaks in SF3B1 MUT sample compared to SF3B1 WT and controls (Fig. 3b).We then compared the localization of the peaks to gene features between SF3B1 MUT and SF3B1 WT cells.In SF3B1 MUT , the proportion of R-loops decreased at 5'UTR, promotertranscription start site (TSS) and gene body (Fig. 3c).R-loops remaining in SF3B1 MUT cells, were more frequently detected in intergenic regions and at 3'UTR of the genes (Fig. 3d).As an example, the SUZ2 gene exhibited a reduced R-loop near its promoter in a SF3B1 MUTsample (Fig. 3e).By contrast, at mitochondrial DNA loci, rDNA repeats, or at some loci like CPNE7 gene, peak intensity was found elevated in SF3B1 MUT -samples showing that their R-loop profiles were selectively changed (Supplementary Fig. 5a).R-loops assemble dynamically at TSS and TTS where they contribute to the regulation of transcription [34][35][36] .Thus, we compared the mRNA level of genes overlapping shared R-loops at TSS, gene body, TTS and 3'UTR in SF3B1 WT and SF3B1 MUT -samples.The expression of genes in which R-loops were detected at TTS or 3'UTR in SF3B1 WT samples was similar in SF3B1 WT and SF3B1 MUT -cells while the expression of genes in which R-loops were detected at gene body in SF3B1 WTsamples increased in SF3B1 MUT -samples.By contrast, the expression of genes in which R-loops were detected at TSS in SF3B1 WT but not in SF3B1 MUT samples dramatically decreased in SF3B1 MUT -samples (Fig. 3f).
To look for differential peaks between the groups, the count of restriction fragments overlapping with significant peaks was normalized using DESeq2.We confirmed a global decrease in the number of R-loops in SF3B1 MUT versus controls or SF3B1 WT basoE.In these analyses, the number of differential peaks (with log 2 (FC) > | 1| and BH-adj P value < 0.05), was 4589 between SF3B1 MUT and controls with only 52 up Fig. 1 | Intron retention reduction correlates with transcriptomic changes of human SF3B1 MUT bone marrow mononuclear cells.RNA-sequencing data of 21 SF3B1 MUT and 6 SF3B1 WT (4 SRSF2 MUT and 2 SF WT ) bone marrow mononuclear cell samples were re-analyzed.a Volcano plot showing 1764 up or down-regulated genes in SF3B1 MUT samples (Log 2 (FC) <|1 | ; Two-sided Wald test and Benjamini-Hochberg (BH)-adjusted P value < 0.05).b Gene Ontology (GO) enrichment analysis of the up and downregulated genes showing significantly enriched terms according to -log10(adjusted P value).Fisher's exact test corrected by false discovery rate (FDR) < 0.05.Terms of interest are in bold.c Heatmap representing the clustering of samples by the variations of expression of a subset of 80 genes belonging to GO terms highlighted in (b).Genes affected by 1 to 8 differential splicing events are marked with an asterisk.d Barplot representing the number and types of differential splicing events in SF3B1 MUT in comparison to SF3B1 WT with ΔPSI > | 0.10| using two-sided Wald test and BH-adjusted P value < 0.05.The bars over 0 indicate the events upregulated in mutant cases and the bars under 0 indicate the events downregulated in wild type cases.e GO over-representation analysis of 296 significantly deregulated genes affected by 383 intron retention reductions.Fisher's exact test corrected by FDR < 0.05.FC: fold-change.Source data are provided as a Source Data file. in SF3B1 MUT samples, and 19,394 between SF3B1 MUT and SF3B1 WT samples, of which only 33 were up-regulated in SF3B1 MUT cells (Fig. 3g; Supplementary Data 4).In comparison to controls, 53% and 20% of the peaks lost in SF3B1 MUT cells were located in gene bodies (mostly centered on intronic sequences), and promoter-TSS, respectively (Supplementary Fig. 5b).Similarly, in comparison to SF3B1 WT cells, 63% and 12% of the peaks lost in SF3B1 MUT cells were located in gene bodies and promoter-TSS (Fig. 3h).In this latter comparison, we identified 7,039 unique genes affected by losses of R-loops either in their UTRs, TSS, TTS or gene bodies, which overlapped 345/498 (69%) of the recurrent IRRs shared by SF3B1 MUT samples (Fig. 3i).For example, at RAD9A and IQGAP3 loci (Fig. 3j) or at COASY locus (Supplementary Fig. 5c), all occupied by an IRR, R-loops were significantly reduced in SF3B1 MUT cells.To validate these findings, we selected two genes (ABCC5 and TCIRG1) with IRR and two genes (IREB2 and TMX2) without IRR in SF3B1 MUT -erythroblasts.We performed DRIP-qPCR in basoE generated from 4 SF3B1 MUT MDS, 3 SF3B1 WT MDS including one MDS with U2AF1 mutation, and 4 healthy donors.Positive controls for R-loops detection were RPL13A and TFPT genes and for each sample, the specificity of the signal was assessed by RNH1 pre-treatment (Supplementary Fig. 6a, b).R-loops detected at ABCC5 and TCIRG1 loci in healthy donor and SF3B1 WT cells were not or hardly detected in SF3B1 MUT -samples.By contrast, at IREB2 and TMX2 loci, the enrichment signal was faint, suggesting the absence of R-loop, whatever the sample (Fig. 3k).
Altogether, these results validate a link between SF3B1 mutation, decreased R-loop formation, intron retention reduction and deregulated gene expression.

SF3B1 mutation promotes a DNA replication stress in human erythroblasts
Since the accumulation of R-loops is reported to slow down the DNA replication fork velocity and cell proliferation 37 , loss of R-loops in SF3B1 MUT erythroblasts may reduce obstacles to fork progression, promote DNA replication and cell proliferation.To explore this, we expanded erythroblasts from CD34 + HSPCs of 5 SF3B1 MUT , 3 SRSF2 MUT or U2AF1 MUT , 3 SF WT -MDS and 4 healthy donors.Flow cytometry analysis showed significant increase in the BrdU intensity of S-phase cells (Fig. 4a, b) and percentage of S-phase cells (Fig. 4c) in SF3B1 MUT -basoE.
Altogether, these results show that accelerated fork velocity was observed when R-loops were lost in SF3B1 MUT -erythroblasts.Exposure of ssDNA without evidence for DNA damage marked by γH 2 AX/53BP1 indicates that SF3B1 MUT -erythroblasts endured a milder replication stress than SRSF2 MUT and U2AF1 MUT -erythroblasts.
To address the functional consequences of these dysregulations, we compared the proliferation of the G1E-ER4 clones.Before GATA1 induction, the proliferation of Sf3b1 K700E/+ proerythroblasts was significantly higher than that of Sf3b1 +/+ cells and remained higher upon induction (P < 0.0001, Fig. 5d).The differentiation to basoE was significantly lower in Sf3b1 K700E/+ cells (Fig. 5e, f).In accordance with this, Sf3b1 K700E/+ cells showed a higher BrdU incorporation and a higher G1/S fraction than Sf3b1 +/+ cells, before and after induction (Fig. 5g).Concomitantly to GATA1 induction, when a mild replication stress was imposed by inducing a cell cycle arrest in early S-phase with Erythroid precursors were expanded in culture from SF3B1 MUT , SF WT MDS and controls samples.a Schematic representation of the protocol.b Erythroid differentiation evaluated on May-Grünwald Giemsa-stained cytospins.Histograms representing the proportion of erythroid precursors in up to 7 controls, 11 SF3B1 MUT and 7 SF WT independent samples at days (d)7-8, 9-10, 11-13 and 14-16.Results are expressed as means ± standard error of the mean.2-way ANOVA for multiple comparisons.Controls versus SF3B1 MUT , P = 0.017; controls versus SF WT , P = 0.012.c Variant allele frequencies of SF3B1 mutation in erythroblasts at d7, d11-13 and d14-15 of 14 independent SF3B1 MUT samples.d Volcano plot representing up-and downregulated transcripts in SF3B1 MUT basoE and polyE compared to SF WT ones.Twosided Wald-test and BH-adjusted P value < 0.05.e Venn diagram representing the numbers of differentially expressed genes between SF3B1 MUT and SF WT samples at basoE and polyE stages.f Gene Ontology (GO) enrichment analysis of up-and down-regulated genes in SF3B1 MUT versus SF WT erythroblasts.Fisher's exact test corrected by false discovery rate (FDR) < 0.05.Specific terms to basoE or polyE as blue or red bars, respectively, shared terms as violet bars.g Gene set Enrichment Analysis (GSEA) showing terms deregulated in SF3B1 MUT basoE.h Barplots representing numbers and types of differential splicing events in SF3B1 MUT  aphidicolin, Sf3b1 K700E/+ compared to Sf3b1 +/+ cells exhibited a significantly higher G1/S phase fraction (Fig. 5h, right panel).Inhibition of dNTP biosynthesis by 0.2 mM hydroxyurea (HU) for 16 h, normalized BrdU incorporation in Sf3b1 K700E/+ cells after induction (Fig. 5i).In metabolomic analysis, the quantities of dATP, dCTP and dTTP before induction were equivalent in Sf3b1 K700E/+ to Sf3b1 +/+ cells, showing that DNA replication stress was not related to nucleotide pool depletion (Fig. 5j).After induction of differentiation with estradiol, dNTP quantities increased in Sf3b1 +/+ cells, consistent with a slowdown of DNA synthesis.In Sf3b1 K700E/+ cells, lower dNTP quantities argued for persistent DNA synthesis (Fig. 5j; Supplementary Fig. 9c).
Upon induction of differentiation, more Sf3b1 K700E/+ cells were positive to p-Rpa32s4/s8 labelling with a higher sensitivity to HU treatment than Sf3b1 +/+ -cells (Fig. 5k).In estradiol-treated cells, Western blot confirmed the engagement of Rpa, but did not show phosphorylation of Chk1 suggesting that the Atr-Chk1 pathway was not activated (Fig. 5l).Finally, the delayed differentiation of Sf3b1 K700E/+ murine erythroblasts can be partially rescued by lowering their high rate of DNA synthesis with HU at the expense of cell viability (Supplementary Fig. 9d, e).

Targeting of R-loops improves the differentiation of human SF3B1 MUT erythroblasts
To establish a link between the loss of R-loops and the phenotypic characteristics of fork velocity and replication stress in human primary SF3B1 MUT erythroblasts, we thought to modulate the level of R-loops in the cell.Previous works have established that the THO complex which contributes to prevent R-loop accumulation interacts with SIN3Ahistone deacetylase complex.Furthermore, inhibiting histone deacetylase activity by depleting SIN3A or treating the cells with trichostatin A stabilizes R-loops 40 .We used pan-HDAC inhibitor Suberoylanilide hydroxamic acid (SAHA)/vorinostat (further denoted HDACi) in DRIPseq experiment.Human erythroblasts from 3 SF3B1 MUT , 3 SF3B1 WT (1 SRSF2, 1 SRSF2/biTET2 or 1 NRAS/biTET2 mutations), and 4 controls samples were pre-treated with HDACi 0.5 μM for 20 h, at day11 of culture.The numbers of shared peaks increased in 3/4 controls, even not significantly.HDACi treatment restored the level of R-loops of 2/3 SF3B1 MUT erythroblast samples up to normal (Fig. 6a; Supplementary Data 6).Unexpectedly, the numbers of R-loops counted in SRSF2 or TET2/NRAS mutated samples collapsed almost entirely (P < 0.001).In SF3B1 MUT samples, the number of shared R-loops was quantitatively important in the gene bodies.The augmentation of R-loops also affected intergenic regions, 3'UTR and TTS more than 5′UTR or promoter-TSS (Fig. 6b).We visualized the changes in R-loop profiles at specific loci.As shown in Fig. 6c and Supplementary Fig. 10, HDACi treatment produced large R-loops near the promoter of BCL2L1, PTPN11, ARPC3 and NCOA4 genes specifically in SF3B1 MUT cells.By contrast, HDACi did not change the profile of R-loops in SF3B1 MUT cells at HK1 locus (Supplementary Fig. 10).To verify whether, by rescuing Rloops, gene expression may change, we performed RT-qPCR at these 4 loci.Upon treatment with HDACi, the expression of BCL2L1 increased significantly in SF3B1 MUT .While the expression of NCOA4 and PTPN11 also tended to increase, HK1 did not (Fig. 6d).These data suggest the relationship between R-loops and gene expression in these cells.
Then we wondered whether the effect of HDACi could be at least partly due to a modification of IRR profiles.To explore this hypothesis, we selected 5 genes with known IRR (PPOX, PPM1A, COASY, S100A4, BCL2L1) and performed RT-PCR to visualize the IRR-transcripts and the spliced isoforms in 4 SF3B1 MUT , 6 SF3B1 WT and 2 control erythroblast samples.HDACi did not modify the pattern of transcripts in controls and SF3B1 WT samples.The abundance of IRR-transcripts in SF3B1 MUT remained similar in the presence or absence of HDACi suggesting that the restoration of R-loops observed under this treatment did not depend on intron retention (Fig. 6e).We confirmed these results for BCL2L1 and COASY transcripts by fluorescent PCR fragment analysis (Supplementary Fig. 11a, b).
To evaluate the impact of R-loop restoration on fork progression, we performed DNA combing of 3 samples of erythroblasts treated or not with 0.2 μM HDACi (1 control, 2 SF3B1 MUT including one with a frameshift mutation in the histone acetyltransferase EP300).The fork velocity in SF3B1 MUT erythroblasts was high and decreased significantly after HDACi treatment.By contrast, neither the SF3B1 MUT /EP300 MUT sample nor the control showed variations of fork velocity after HDACi (Fig. 7a).To assess DNA synthesis in a larger number of patients, we performed BrdU assays.The percentage of S-phase cells decreased significantly in SF3B1 MUT cells.However, the BrdU intensity higher in SF3B1 MUT samples did not change (Fig. 7b).Then, we verified the impact of HDACi (0.5 μM 20 h) on the frequency of p-RPA32 s33 and γH2AX foci.No increase of positive cells was observed in control, SF3B1 MUT or SF3B1 WT erythroblast samples suggesting that HDACi at the concentration used did not provide DNA damage to these cells (Fig. 7c-e).
Finally, we studied the consequences of HDACi treatment on erythroid cell differentiation.At progenitor level, HDACi did not change the number or size of the BFU-E type colonies formed in methylcellulose (Fig. 7f).Looking at mutations in single colonies we did not observe any clonal selection during the 14 days of semi-solid culture (Supplementary Fig. 11c).By contrast, HDACi drastically forced the maturation of erythroblasts at late stage as shown by a significant increase of mature erythroblasts (Fig. 7g, h).This effect was also confirmed by the increase of GPA + CD49d low cell proportion corresponding to orthochromatic erythroblasts (Fig. 7i).Interestingly we did not observe any inhibitory effect of HDACi at 0.5 μM on the overall rate of expansion of SF3B1 MUT erythroid precursors compared to SF3B1 WT or control erythroblasts (Supplementary Fig. 11d), or increase of cell death in SF3B1 MUT , SF3B1 WT , and control cell cultures (Fig. 7j).Altogether, these results showed that HDACi by producing R-loops, slowed DNA replication and facilitated erythroid cell differentiation without altering cell proliferation.

Discussion
The present study shows that SF3B1 mutations causing an increased proliferation of immature erythroblasts with a reduced capacity to terminal differentiation trigger a replication stress with ssDNA exposure in erythroid cells.Accelerated DNA replication fork velocity is observed when R-loops are lost.HDAC inhibition restores R-loops and decreases fork speed, which correlates with erythroid differentiation improvement.These features distinguish SF3B1 mutation from other splicing factor mutations.
Alternative splicing of pre-mRNA contributes to physiological hematopoiesis [26][27][28]41 . Inton retentions increase along terminal steps of erythroid differentiation 28 .Intron retention modulates gene fragments overlapping with peaks between SF3B1 MUT and control samples (left panel) and SF3B1 MUT and SF3B1 WT samples (right panel) with log 2 (FC) >|1| using twosided Wald-test and a BH-adjusted P value < 0.05. h istribution to gene features of differential R-loops in SF3B1 WT samples and lost in SF3B1 MUT samples.i Venn diagram showing overlap between genes that lost one R-loop and genes with intron retention reduction (IRR) in SF3B1 MUT erythroblasts.j DRIP-seq and RNA-seq overlays at RAD9A and IQGAP3 loci showing R-loop losses and IRR events in SF3B1 MUT erythroblasts.Gene structures using GENCODE GRCh37.k DRIP-qPCR analysis of 4 controls, 3 SF3B1 WT including 1 U2AF1 MUT designated as green dot and 4 SF3B1 MUT samples.Enrichment signals (normalized to input) at specific loci were normalized to EGR1 (no R-loop).RPL13A and TFPT as positive controls.In box plots, central lines represent medians, bounds represent lower and upper quartiles and whiskers correspond to min-max values.Two-sided unpaired t-test for P values (see Suppl informations).b, e, j RPM: reads per million.* P < 0.05; ** P < 0.01; **** P < 0.0001; ns: not significant.Source data are provided as a Source Data file.
expression by generating transcripts that are either detained in the nucleus or degraded in the cytoplasm by the NMD 41,42 .Here, we show that SF3B1 MUT promotes a reduction of intron retention in erythroblasts and that the number of retained introns lost in SF3B1 MUT -erythroid precursors increases between basoE and polyE.By reversing a physiological process, SF3B1 mutation changes gene expression profile and reshapes the proteome, affecting several pathways such as DNA replication, DNA repair mainly base excision repair and nucleotide biosynthesis.SF3B1 MUT -erythroblasts retain proliferative capacities, which contributes to enrichment of the bone marrow in immature erythroblasts and to defective production of mature erythroblasts, defining ineffective erythropoiesis.
To synthesize DNA properly, the replication machinery must overcome several obstacles, including R-loops and transcription complexes 37 .Most of the R-loops located at promoter-TSS, in gene bodies and intergenic regions in SF3B1 WT -erythroblasts were lost in SF3B1 MUT -erythroblasts, in contrast with the augmented R-loops detected in SRSF2 MUT erythroblasts.Previous studies in cell lines or in primary CD34 + SF MUT progenitors using single-cell imaging with S9.6 antibody have shown that not only SRSF2 MUT or U2AF1 MUT , but also SF3B1 MUT cells may produce undesirable R-loops [19][20][21] .However, S9.6 foci may indicate RNA:DNA hybrids and also double-stranded RNA which are more abundant, making difficult the quantification of R-loops even when RNaseH1 is overexpressed in cell lines to assess the specificity of the signals 19,43,44 .Here we used DRIP-seq to obtain a genome-wide landscape of R-loops.This technique allowed us to identify short type I R-loops located at GC-skewed promoter-proximal regions, or in intergenic regions that may contain active enhancers 21 , which are preferentially detected by R-ChIP, and also, large type II R-loops (spanning over 300 bp to 1 kb) distributed along the body of transcribed genes 45,46 .The decreased number of R-loops at promoters-TSS and gene bodies in SF3B1 MUT cells suggest that both large and short R-loops are lost.As the GC skew characteristic of R-loops at promoter progressively diminishes after the first exon/intron junction, the mechanism of R-loop formation may be different in gene bodies 45 .Intron retention, by increasing homology between nascent RNA and its DNA template can initiate the co-transcriptional formation of large R-loops spreading over the gene coding sequence 22,29,47,48 .Conversely, the binding of splicing machinery making intron excision hinders R-loop accumulation 30,33,49 .Increased intron excision occurring when SF3B1 is mutated may suppress R-loops along the gene bodies.We report here that most of IRR overlap with lost R-loops.However, because R-loops were lost also in promoter-TSS, intergenic regions and TTS, IRR is not the unique mechanism for R-loop loss in these cells.
The presence of unscheduled R-loops at promoter-TSS as a consequence of RNA polymerase II pausing usually correlates with high gene expression 45 .R-loop-positive regions overlap with DNase I-hypersensitive regions, indicative of open chromatin 21,45 .Our integrative analysis of R-loops and gene expression associated the loss of R-loops with low transcript level in SF3B1 MUT erythroblasts.Importantly, we identified BCL2L1, a GATA1 and STAT5 target gene, which expression increased, without splicing changes, when R-loops formed near its promoter under HDACi treatment.This is consistent with the interference of R-loops with transcription.In addition, since SF3B1 protein together with U2AF1 and SRSF2 were detected in the vicinity of R-loops at promoters 50 , the SF3B1 mutant protein, could notably modify the kinetics of transcription elongation as generally reported for splicing factors 51 .
Replication stress that appears at the early stages of malignant cell transformation, has been linked to events that impair DNA synthesis integrity such as reduced or increased origin firing, nucleotides or replication factor depletion and replication-transcription conflicts 52,53 .Replication stress can produce stalled forks, under-replicated DNA, or supra-acceleration of forks 54 .As opposed to SRSF2 MUT or U2AF1 MUT cells in which fork progression is slowed down, we detected an accelerated fork speed when R-loops were lost in SF3B1 MUT -erythroblast.Alternative causes of increased fork velocity were associated with overexpression of oncogenes, downregulation of mRNA biogenesis, or PARP1 inhibition [55][56][57][58] .Moreover, activation of oncogenic HRAS in presenescent cells was shown to accelerate forks by inducing overexpression of topoisomerase 1 (TOP1), which is known to resolve unwanted R-loops 59 .TOP1 was heavily expressed in SF3B1 MUT and SF3B1 WT erythroblasts, while senataxin and THO complex proteins, other R-loop modulators were specifically upregulated in SF3B1 MUT erythroblasts and could contribute to fork velocity 60 .Furthermore, if the transcription rate decreases when R-loops are resolved, not only R-loop loss but also the reduction of transcription-replication conflicts could prevent replication fork stalling 61,62 .
The detection of p-RPA32 foci without DNA damage marks γH2AX or 53BP1 suggested replicative ssDNA gaps.ATR continuously monitors the recruitment of RPA32 on ssDNA within the replisomes and its phosphorylation on serine 33, independently of CHK1 63 .SF3B1 MUTerythroblasts endure a mild replication stress without engagement of the CHK1 pathway (Fig. 5l).In cancer cells, the tolerance to replication stress is supported by the overexpression of the upstream components of ATR-CHK1 pathway, Clapsin and Timeless, independently of ATR signaling 64 .As SF3B1 MUT -cells overexpressed Claspin and Timeless genes and Timeless protein, we cannot exclude their role in the mechanism of replication stress tolerance.Alternatively, RPA could bind the displaced ssDNA of R-loops and recruit RNaseH1 to facilitate its resolution 65 .Further studies are needed to test these hypotheses.

RNA-sequencing
RNA-seq data from a first cohort of BM MNC from 27 lower-risk (LR)-MDS (21SF3B1 MUT and 6 SF3B1 WT (4 SRSF2 MUT and 2 SF WT ) previously published 2 were re-analyzed.RNA-sequencing of two additional cohorts was performed: one cohort of BM MNC from 185 LR-MDS (74 SF3B1 MUT , 30 SRSF2 MUT , 11 U2AF1 MUT , 70 triple-negative samples) and one cohort of basoE and/or polyE obtained from 13 MDS (8 SF3B1 MUT and 5 SF3B1 WT ).RNA integrity (RNA integrity number ⩾ 7.0) was checked on the Agilent Fragment Analyzer (Agilent, Santa Clara, CA) and quantities were determined using Qubit (Invitrogen, Waltham, CA).50-100 ng of total RNA sample was used for poly-A mRNA selection using oligo(dT) beads and subjected to thermal fragmentation.For BM MNC, MuLV Reverse Transcriptase (Invitrogen) was used for cDNA synthesis.Libraries were constructed using the TruSeq Stranded mRNA Sample Preparation Kit (Illumina, San Diego, CA) and sequenced on an Illumina HiSeq 2500 platform using a 100-bp pairedend sequencing strategy.For RNA-sequencing of erythroblasts, fragmented mRNA samples were subjected to cDNA synthesis, converted into double stranded DNA using SureSelect Automated Strand Specific RNA Library Preparation Kit.Libraries were bar-coded, and subjected to 100-bp paired-end sequencing on Novaseq-6000 sequencer (Illumina, San Diego, CA).For RNA-sequencing of murine G1E-ER4 erythroblasts, libraries were constructed using the TruSeq Stranded mRNA Sample Preparation Kit (Illumina) and sequenced on an Illumina NextSeq500 platform (Illumina) using a 75-bp paired-end sequencing strategy.

Bioinformatic analysis of RNA-sequencing
FASTQ files were mapped using STAR (v2.7.9) to align the reads against the human reference genome GRCh37 (UCSC version hg19) downloaded from the GENCODE project website.Reference genome and annotations are available on Gencode (https://www.gencodegenes.org/ human/release_19.html) 73.Read count normalizations and groups comparisons were performed using DESeq2 (v1.30.1), with the Wald test for significance testing.Genes with low counts were filtered, if at least half the samples have less than 20 normalized reads.Outliers removal with Cook's distance was left at default.For differential expression study, the results obtained after DESeq2 comparison were selected for further analysis and filtered at Benjamini-Hochberg (BH)adjusted P value < 0.05 and log2(FC) > | 1 | 74 .To identify differentially expressed splicing events, we used KisSplice (v2.6.2),KisSplice2refgenome (v2.0.7) and KissDE (v1.15.3) 75,76 , and filtered results at BHadjusted P value < 0.05 and ΔPSI > |0.10 | .Biological processes associated with genes that were differentially expressed or spliced were determined using the Gene Ontology enrichment or overrepresentation analyses and Gene Set Eenrichment analysis with reference to specific gene sets (Supplementary Data 7).
(Promega, Madison, Wi) in 50 mM Tris/HCl pH8.5 buffer.Peptides were recovered by filtration, desalted on C18 reverse phase StageTips and dried.They were then separated in 5 fractions by strong cationic exchange StageTips and analyzed using an Orbitrap Fusion mass spectrometer (Thermo Fisher Scientific).Peptides from each fraction were separated on a C18 reverse phase column (2 μm particle size, 100 A pore size, 75 μm inner diameter, 25 cm length) with a 170 min gradient starting from 99% of solvent A containing 0.1% formic acid in milliQ-H 2 O and ending in 55% of solvent B containing 80% acetonitrile and 0.085% formic acid in milliQ-H 2 O.The mass spectrometer acquired data throughout the elution process.The MS1 scans spanned from 350 to 1500 Th with 1.10 6 Automatic Gain Control (AGC) target, 60 ms maximum ion injection time (MIIT) and resolution of 60 000.MS Spectra were recorded in profile mode.High energy Collision Dissociation (HCD) fragmentations were performed from the most abundant ions in top speed mode for 3 seconds with a dynamic exclusion time of 30 s. Precursor selection window was set at 1.6Th.HCD Normalized Collision Energy was set at 30% and MS/MS scan resolution was set at 30000 with AGC target 1.10 5 within 60 ms MIIT.

Proteomic data analysis
The mass spectrometry data were analyzed using Maxquant version 2.1.1.0 78.The database used was a concatenation of Human sequences from the Uniprot-Swissprot database (Uniprot, release 2022-05) and the list of contaminant sequences from Maxquant.Cystein carbamidomethylation was set as constant modification and acetylation of protein N-terminus and oxidation of methionine were set as variable modifications.Second peptide search and the "match between runs" (MBR) options were allowed.False discovery rate (FDR) was kept below 1% on both peptides and proteins.Label-free protein quantification (LFQ) was done using both unique and razor peptides with at least 2 peptides ratios required for LFQ.
Statistical analysis was done using "R".Among identified proteins, those with at least 70% of values in at least one condition were selected.Then, proteins with t-test P value < 0.05 were defined as significantly differentially expressed, and proteins with 100% of valid values in one condition and 0% of valid values in the other condition were defined as "Appeared" or "Disappeared".Log2(LFQ intensity) matrix were filtered before imputation.For heatmaps, Z-score were calculated after imputation.
Functional analyses were generated through Ingenuity Pathway Analysis (QIAGEN Inc., https://www.qiagenbioinformatics.com/products/ingenuitypathway-analysis) version 76765 844 for each list of differential proteins.Significantly over-represented biological terms (Canonical pathways or Diseases and functions) were identified with a right-tailed Fisher's Exact test that calculates an overlap P value determining the probability that each term associated with our lists of differential proteins was due to chance alone.The z-score is a statistical measure of correlation between relationship direction and experimental protein expression.Its calculation assessed the activation (positive z-score) or repression (negative one) of each term.To be considered significant the z-score has to be greater than 2 in absolute value.Overlap of selected pathways and functions was designed using Cytoscape version 3.9.1.
Percentage of input expected between 1-15% %input = 100 × 2 ðCt input corrected À Ct DRIPed DNAÞ where Ct ðcycle thresholdÞ input ðcorrectedÞ = ðCt input À log 2ð10ÞÞ and fold enrichment (expected between 20-300) Fold enrichement = ½2 ðCt input ðpositive locus correctedÞÞ À Ct DRIPed ðDNA positive locusÞ = ½2 ðCt input negative locus ðcorrectedÞÞ À Ct DRIPed DNA ðnegative locusÞ were calculated to assess the immunoprecipitation efficiency and specificity, respectively.For DRIP-seq, good quality DRIPed DNA samples (with and without RNase H1-treatment) were sonicated to get an average length of 200-300 bp.Then samples were subjected to endrepair, dATP tailing, adaptor ligation and library indexing.Lastly, the libraries were cleaned up using AMPure beads, and amplified.After checking library quality on Agilent Bioanalyzer using Agilent High Sensitivity DNA 1000 kit, libraries were sequenced on Illumina Nova-Seq instruments.

DRIP-seq bioinformatic analysis
100 bp paired-end reads were trimmed and filtered using fastp (v0.23.4) to remove low quality reads, low complexity sequences, as well as polyG tails (corresponding to no signal in the Illumina two-color systems, in NovaSeq data).Reads were mapped to the human reference genome (GENCODE, GRCh37; https://www.gencodegenes.org/human/release_19.html)with Bowtie2 (v2.3.5.1), using parametersnodiscordant andno-mixed.Resulting SAM files were piped through Samblaster (v0.1.24)to remove duplicate reads, then through Samtools (v1.10) to generate sorted (by coordinates) BAM files, with a cutoff for MAPQ score of 10.Peaks were called for each replicate with the MACS algorithm (MACS3) in broad mode, with the input DNA as control as well as the same sample treated with RNase H1, at q value < 0.1.Resulting peaks were analysed by groups of biological replicates with MSPC (v5.5.0) 80 .Firstly, by categorizing them as either background, weak, or stringent (with both cut-off on P value at 1e-4 and 1e-8).Weak peaks were rescued if stringent in other biological replicates.Peaks were then confirmed or discarded based on the combined stringency test supported by enough replicates and if their combined stringency, using Fisher's combined probability test, satisfies the threshold of 1e-8.Confirmed peaks are qualified as true positive if they pass the Benjamini-Hochberg (BH) multiple testing correction at level 0.05.
To identify differentially expressed R-loops, we intersected the peaks from MACS3 calling with the 5.5 million of restriction fragments generated before S9.6 immunoprecipitation, using the resulting matrix as a reference frame for featureCounts (v2.0.0).Normalization and differential R-loop expression analysis between wild-type, mutant and control samples was performed with DESeq2, using BH-adjusted P value < 0.05 and log2 (FC) > | 1 | .Bedtools (v2.27.1) intersect was used for overlap analyses.

Cell cycle analysis and live cell imaging
Cell cycle was analysed by double labelling using a fluorescent anti-BrdU antibody and 7-aminoactinomycine D (7-AAD, BD Biosciences).Cells were incubated with 10 µM BrdU for 30 min at 37 °C (BD Biosciences).Then cells were pelleted, fixed in 500 µL 70% ethanol for 20 min at room temperature and washed in 1X PBS supplemented with 0.5% bovine serum albumin (BSA).Cell pellet is resuspended in 2 M chlorhydric acid for 20 min at room temperature and washed in 1X PBS.Acid was neutralized by a 0.1 M borate solution pH 8.5 for 2 min (Borax Na 2 B 4 O 7 , Merck).Cells were washed 3 times and transferred to 96-well plates for incubation with 10 µL anti-BrdU antibody or isotype control for 20 min at room temperature either FITC-anti-BrdU (BD Biosciences, clone 3D4) or APC-anti-BrdU (BD Biosciences, clone 3D4) at a final concentration of 0.5 μg/mL.Cells were washed 3 times and incubated with 50 µL of a 10 µg/mL 7-AAD and 50 µg/mL RNase A (Macherey-Nagel, Hoerdt, France) solution for 30 min at room temperature.Finally, cells were transferred into Eppendorf vials containing 100 µL 1X PBS and analysed on BD Accuri C6 flow cytometer using CFlow Plus software (BD Biosciences) (Supplementary Method 4).The proliferation capacities of G1E-ER4 cells treated or not with β-estradiol at 10 −7 M (Merck) were measured using IncuCyte live-cell imaging system (Essen Instruments, Ann Arbor, MI).48-well plates were coated with 0.01% Poly-L-lysine sterile-filtered solution (Merck).Cells were seeded at 30,000 cells/well and monitored over time with 9 images per well every 3 hours for 72 h.Results were expressed as cell confluence (in % of occupied space).
DNA fiber combing DNA replication was analyzed by pulse labelling with fluorescent thymidine analogs of human erythroblasts.Cells were first labelled with 20 µM iododesoxyuridine (IdU) for 30 min at 37 °C and then with 100 µM chlorodesoxyuridine (CldU) for 30 min at 37 °C.DNA replication was blocked by the addition of an excess of thymidine (300 µM) on ice and cells were washed in 1X PBS and counted.DNA combing was performed after DNA extraction in 1% low melting agarose plugs (0.3 × 10 6 cells in 90 µL per plug).Briefly, low melting agarose was maintained at 45 °C.Cells were resuspended at 6.66 × 10 6 in one volume of 1X PBS, mixed in the same volume of 2% agarose and distributed in plug mold.Then proteins in agarose plugs were digested with 1 mg/mL proteinase K in 0.25 M EDTA pH 8/ 1% SDS at 42 °C for 48 h.Finally, plugs were washed 3 times in 10 mM Tris-HCl pH 8/1 mM EDTA (TE), and agarose was eliminated by digestion using β-agarase for 48 h.For DNA fiber stretching, extracted DNA was resuspended in 0.25 M 2-(N-morpholino)-ethanosulfonic acid pH 5.5 in Eppendorf vials for 30 min at 65 °C, 30 min at room temperature and 2 weeks at 4 °C.Before combing, vials were placed at room temperature for 30 min.DNA was then placed in FiberComb reservoir (Genomic Vision, Bagneux, France) and stretched on silane treated coverslips.Coverslips were sticked on glass slides and incubated for 2 h at 60 °C and stored at -20 °C.For hybridization, DNA was denatured in a 1 N NaOH solution, rinsed in cold 1X PBS and dehydrated in 70%, 85% and 100% ethanol.Aspecific sites were blocked for 30 min at 37 °C.DNA was first incubated with primary anti-IdU (BD Biosciences, cat no.347580, dilution: 1/50) mouse antibody and anti-CldU (Abcam, cat no.Ab 6326, dilution: 1/50) rat antibody, and then with fluorescent secondary antibodies.Finally, DNA was labelled with a primary anti-ssDNA antibody (mouse), and with a first fluorescent secondary anti-mouse antibody (goat), and with a second fluorescent secondary anti-goat antibody (donkey).Slides were mounted in Vec-taShield medium (Vector Laboratories, Burlingame, CA).All antibodies and reagents are described in Supplementary Methods 5. Fiber length and symmetry of a minimum of 200 fibers per group were measured.Fork symmetry was expressed as the IdU/CldU ratio.The speed of the replication fork was calculated by the ratio (d I + d Cl ) / (t I + t Cl ), where d I and t I represent respectively the measured distance (in kb) and labelling time (in min) for IdU incorporation, and d Cl and t Cl denote the corresponding parameters for CldU incorporation.

Western blot analysis
Cell lysates were solubilized for 5 minutes at 95 °C in Laemmli buffer (65 mM Tris [pH 6.8], 20% glycerol, 5% β-mercaptoethanol, 0.01% bromophenol blue, and 2% sodium dodecyl sulfate [SDS]) Proteins were separated by SDS-polyacrylamide gel electrophoresis and transferred to a nitrocellulose membrane (VWR, Radnor, PA).Membranes were blocked with 5% of dry milk in TBS-T buffer (10 mM Tris-HCl, pH 7.5, 150 mM NaCl, 0.15% Tween 20) for 1 h and incubated in specific antibody overnight at 4 °C (Supplementary Methods 7).Membranes were washed in TBS-T buffer and incubated for 1 h at room temperature with secondary horseradish peroxidase (HRP)-linked antibody (horse anti-mouse HRP-linked antibody 7076 S or goat antirabbit HRP-linked antibody 7074 V, Cell Signaling, Danvers, MA).Enzyme activity was visualized by an ECL-based detection system (VWR, Radnor, PA).Blot imaging was performed on the Fujifilm LAS-3000 Imager (Fujifilm, Tokyo, Japan) and images were analysed using the Multi Gauge software (Fujifilm).
Targeted LC-MS metabolomics analyses 3.10 5 Sf3b1 K700E/+ or Sf3b1 +/+ G1E-ER4 cells were collected after 0 h or 24 h of β-estradiol treatment and with or without 0.2 mM HU for 16 h (Merck).For metabolomics analysis, extraction was performed in 30 μL of 50% methanol, 30% acetonitrile (ACN) and 20% water.After centrifugation at 16,000 g for 15 min at 4 °C, supernatants were collected and stored at −80 °C until analysis.LC/MS analyses were conducted on a QExactive Plus Orbitrap mass spectrometer equipped with an Ion Max source and a HESI II probe coupled to a Dionex UltiMate 3000 uHPLC system (Thermo).Samples (5 µL) were injected onto a ZIC-pHILIC column with a guard column (Millipore) for LC separation in a gradient of buffer A (20 mM ammonium carbonate, 0.1% ammonium hydroxide pH 9.2), and buffer B (ACN) with a flow rate of 0.200 µL.min −1 as follows: 0-20 min, linear gradient from 80% to 20% of buffer B; 20-20.5 min, linear gradient from 20% to 80% of buffer B; 20.5-28 min, 80% buffer B. The mass spectrometer was operated in full scan, polarity switching mode with the spray voltage set to 2.5 kV and the heated capillary held at 320 °C.The sheath gas flow was set to 20 units, the auxiliary gas flow to 5 units and the sweep gas flow to 0 units.The metabolites were detected across a mass range of 75-1000 m/z at a resolution of 35,000 (at 200 m/z) with the automatic gain control target at 10 6 and the maximum injection time at 250 ms.Lock masses were used to ensure mass accuracy below 5 ppm.Data were acquired with Thermo Xcalibur software (Thermo Fisher Scientific).The peak areas of metabolites were determined using Thermo TraceFinder software (Thermo Fisher Scientific), identified by the exact mass of each singly charged ion and by the known retention time on the HPLC column.Each metabolite was quantified as the area under the curve and results were expressed as arbitrary unit (A.U).

RT-PCR and PCR on colonies
For RT-PCR, RNA was extracted using RNAeasy Mini Kit (Qiagen) and retrotranscribed with the Maxima First Strand cDNA synthesis kit (Thermo Fischer Scientific).cDNA was amplified using Phire Hot Start II DNA polymerase (Thermo Fischer Scientific).Amplicons were analysed by electrophoresis on a 2% agarose gel.For fluorescent RT-PCR, the PCR was performed using the same primers except that the forward primers were labelled with 6-carboxyfluorescein (6-FAM) at 5' end.For capillary electrophoresis, 1 μL of diluted PCR product was added to 0.2 μL of GeneScan 500 ROX dye standard and 18 μL of RNAse-free water.After denaturation for 5 min at 95 °C, fragments were separated using the 3730xl DNA analyzer and analyses was performed using GeneMapper Software 5 (Thermo Fischer Scientific).For qPCR, the primers were used with the SYBRGreen Master Mix (Meridian Bioscience) in a LightCycler480 (Roche).Expression levels were normalized to actin (ACT) and cyclophilin A (PPIA) expression using geometric averaging, and analyzed using ΔΔCt method.Primers for RT-PCR, fluorescent RT-PCR and qPCR are listed in Supplementary Methods 8.

Statistical analysis
For quantitative variables, values were expressed as median and interquartile range (IQR) or means and standard error of the mean (SEM) and compared using the Student t-test or non parametric Mann-Whitney or Kruskal-Wallis tests.Chi-squared or Fisher exact tests were used to compare categorical variables.For transcript quantification, the Mann-Whitney test was used to assign a statistical significance for each group comparison.P values < 0.05 were considered significant (JMP version 10.0.2, SAS Institute Inc, Cary, NC).

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The RNA-sequencing and DRIP-sequencing data generated in this study have been deposited in the NCBI's Gene Expression Omnibus (GEO) database.Accession codes are provided below.The processed RNA-sequencing and DRIP-sequencing data are available in the Supplementary information as Supplementary Data indicated for each of them.RNA-seq BM MNC cohort of 27 cases (Supplementary Data 1): GSE220525, RNA-seq of human basophilic erythroblasts and polychromatophilic erythroblasts (Supplementary Data 2): GSE220523.DRIP-seq of human basophilic erythroblasts (Supplementary Data 4): GSE220271.RNA-seq of mouse G1EER erythroblasts (Supplementary Data 5): GSE220516.RNA-seq data of the BM MNC cohort of 185 MDS patients (Supplementary Fig. 1b) are available at GSE220518 under restricted access since these data are considered sensitive personal data according to the European Union General Data Protection Regulation (GDPR) and thus cannot be shared with third-parties without prior approval.Access can only be granted for research purposes.An application must be sent to michaela.fontenay@inserm.fr.The proteomic data are available on ProteomeXchange Consortium via the PRIDE partner repository: Human erythroblast proteome (Supplementary Data 3) and mouse erythroblast proteome (Supplementary Data 5): PXD038700.Source data are provided with this paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.To view a copy of this licence, visit http://creativecommons.org/ licenses/by/4.0/.

Fig. 2 |
Fig.2| Transcriptomic and proteomic features of SF3B1 MUT bone marrowderived basophilic (basoE) and polychromatophilic (polyE) erythroblasts.Erythroid precursors were expanded in culture from SF3B1 MUT , SF WT MDS and controls samples.a Schematic representation of the protocol.b Erythroid differentiation evaluated on May-Grünwald Giemsa-stained cytospins.Histograms representing the proportion of erythroid precursors in up to 7 controls, 11 SF3B1 MUT and 7 SF WT independent samples at days (d)7-8, 9-10, 11-13 and 14-16.Results are expressed as means ± standard error of the mean.2-way ANOVA for multiple comparisons.Controls versus SF3B1 MUT , P = 0.017; controls versus SF WT , P = 0.012.c Variant allele frequencies of SF3B1 mutation in erythroblasts at d7, d11-13 and d14-15 of 14 independent SF3B1 MUT samples.d Volcano plot representing up-and downregulated transcripts in SF3B1 MUT basoE and polyE compared to SF WT ones.Twosided Wald-test and BH-adjusted P value < 0.05.e Venn diagram representing the numbers of differentially expressed genes between SF3B1 MUT and SF WT samples at basoE and polyE stages.f Gene Ontology (GO) enrichment analysis of up-and down-regulated genes in SF3B1 MUT versus SF WT erythroblasts.Fisher's exact test

Fig. 3 |
Fig.3| Loss of R-loops overlaps intron retention reductions in SF3B1 MUT human primary erythroblasts.DRIP-sequencing.a Violin plots showing the shared peak numbers in 4 controls, 5 SF3B1 MUT and 6 SF3B1 WT (4 SF WT , 2 SRSF2 MUT ).Two-sided unpaired t-test.Controls versus SF3B1 MUT , P = 0.011; SF3B1 WT versus SF3B1 MUT , P = 0.015.b DRIP-seq profiles ± RNaseH1 (RNH1) of a 50-kb region on chr7 showing the distribution of R-loops in reads per million.c Pie charts representing localizations of shared R-loops at gene features.d Proportion of shared peaks at gene features in SF3B1 MUT samples relative to SF3B1 MUT + SF3B1 WT samples.e DRIP-seq profiles showing R-loops near SUZ12 promoter.f Comparison of the expression of genes with overlapping R-loops at TSS, gene body, TTS and 3'UTR in SF3B1 WT samples and without overlapping R-loops in SF3B1 MUT samples.Violon plots represent the difference of mean expression intensity between SF3B1 WT and SF3B1 MUT samples (d11).Central lines represent the means.Gene numbers in each category are indicated.One sample Wilcoxon signed rank test is used for comparison of actual mean to theorical mean.TSS, P < 0.0001; gene body, P = 0.031; TTS, P = 0.179; 3'UTR, P = 0.118.g Volcano plot representing differential restriction

Table 1 |
Clinical and biological characteristics of patients according to SF3B1 gene status Unpaired t-test to compare quantitative values and Fisher 's exact test to compare categorical variables.SLD single lineage dysplasia, MLD multilineage dysplasia, RS ring sideroblasts, EB1 excess of blasts type 1 (5 to 9% bone marrow blasts), IPSS-R International Scoring System-Revised, no number, IQR interquartile range, NA not available.
versus SF3B1 WT basoE and polyE with ΔPSI > | 0.10| using two-sided Wald-test and BH-adjusted P value < 0.05.Bars over 0 indicate events upregulated and bars under 0 indicate events downregulated in SF WT erythroblasts.i Venn diagram of intron retention reductions (IRR) in SF3B1 MUT basoE and polyE.j.GO terms overrepresented among genes with IRR in SF3B1 MUT basoE and polyE.Fisher's exact-test corrected by FDR < 0.05.k Volcano plots representing differentially expressed proteins in SF3B1 MUT versus SF WT samples at proE/basoE and polyE stages (Wald-test, BHadjusted P value < 0.05).l Cytoscape representation of Ingenuity Pathway Analysis showing deregulated pathways in SF3B1 MUT versus SF3B1 WT samples (P values < 0.05 by Student t-test) either basoE-specific (blue dots), polyE-specific (red dots) or shared (violet dots).Scale: dot size proportional to -log10 (adjusted-P value).Source data are provided as a Source Data file.